Preliminary assessment of three quantitative approaches for estimating time-since-deposition from autofluorescence and morphological profiles of cell populations from forensic biological samples

Determining when DNA recovered from a crime scene was transferred from its biological source, i.e., a sample's 'time-since-deposition' (TSD), can provide critical context for biological evidence. Yet there remain no analytical techniques for TSD that are validated for forensic casework. In this study, we investigate whether morphological and autofluorescence measurements of forensically relevant cell populations generated with Imaging Flow Cytometry (IFC) can be used to predict the TSD of 'touch' or trace biological samples. To this end, three different prediction frameworks for estimating TSD in days were evaluated: the elastic net, gradient boosting machines (GBM), and generalized linear mixed model (GLMM) LASSO. Additionally, we transformed these continuous predictions into a series of binary classifiers to evaluate their potential utility for forensic casework. GBM and GLMM-LASSO showed the highest accuracy, with mean absolute error estimates in a hold-out test set of 29 and 21 days, respectively. Binary classifiers for these models correctly binned 94–96% and 98–99% of the age estimates as over/under 7 or 180 days, respectively. This suggests that TSD predicted from IFC measurements, coupled to one or possibly a combination of binary classification decision rules, may provide probative information for trace biological samples encountered during forensic casework.


Introduction
An ongoing priority for forensic caseworking agencies has been developing signatures that can associate a time-since-deposition (TSD) with DNA evidence [1]. A TSD estimate can help determine whether a person of interest's DNA was transferred within or outside the time frame of the crime, which in turn informs its relevance. Many methods have been proposed for TSD, including Raman spectroscopy [2], infrared spectroscopy [3], mRNA signatures [4][5][6], bacterial profiling [7] and colorimetric assays [8]. While promising, many of these techniques have been demonstrated on only a limited range of TSDs (e.g., samples less than a month old). Further, the overwhelming majority of previously described methods have been applied to blood and/or saliva and have not been tested on 'touch' or trace biological samples, which consist largely of keratinized epidermal cells and often make up the majority of samples processed by DNA caseworking units.
One promising approach for determining the TSD of trace biological samples relies on Imaging Flow Cytometry (IFC) characterization of epidermal cells recovered from the sample. Previous research has shown that the morphological and/or autofluorescence profiles of cells derived from either buccal tissue or shed epidermal deposits change over time in mock evidentiary samples [9,10]. Further, for shed epidermal cells, IFC profiles can be used to build predictive linear models that provide a probative time interval for sample deposition, e.g., whether the sample was deposited less than a week before collection, between one week and two months before collection, or more than two months before collection [10].
Using this as a foundation, the goal of this study was to test a different quantitative approach, based on continuous machine learning (ML) models, for predicting TSD from the autofluorescence/morphological profiles of touch/trace cell populations. Continuous regression models have the potential to provide finer-grained estimates of TSD than categorical time intervals, e.g., a TSD estimate of four days versus an estimate of less than a week. Previous work has applied ML to prediction problems in forensic science, including recent applications predicting tissue age, and to parameter selection for fluorescence molecular tomography [11,12]. ML has also been applied to IFC measurements to classify white blood cell types [13] and red blood cell types [14], to differentiate cancer cells from blood cells [15], and to predict gene expression from blood cells [16]. To our knowledge, ML has never been applied to estimate the TSD of epithelial cells from touch samples using IFC measurements.
While many forms of machine learning models could be applied to this prediction problem, of particular interest are those with an interpretable framework that could yield biological insight into the mechanisms by which the measured features of collected cells change with age. We chose to employ three distinct ML approaches: the elastic net, using both the LASSO and the ridge regression penalties; gradient boosting machines (GBM); and generalized linear mixed model (GLMM) LASSO. The elastic net and GLMM models provide the greatest level of model interpretability, while the GBM approach is more opaque but often provides superior prediction.

Sample collection
Shed epidermal cell samples derived from the palmar surface and fingers were collected from an existing registry of biological samples between March 1, 2020 and December 1, 2021, from participants following protocols approved by the Virginia Commonwealth University Institutional Review Board (VCU-IRB), protocol #HM20000454_CR9, and were analyzed between August 1, 2020 and April 1, 2023. Written informed consent was obtained from all participants in this study. Epidermal samples were collected from 15 different individuals, with some donors contributing multiple samples representing different time-since-depositions. After sample collection, the Principal Investigator (PI) de-identified the data; the de-identified data were used for all subsequent analyses, and no other author had access to potentially identifying information. This yielded a total of 47 different samples for the study. Sample deposition involved each donor holding a 50 mL conical tube (Olympus Plastics, 28-108) with both hands for approximately three minutes. Tubes were then left at room temperature for designated periods ranging from 1 to 415 days. Because an existing sample registry was used, and to capture a wide range of possible TSDs, the same time points were not sampled for every donor.
Epidermal cells were collected from the substrate with two sterile cotton swabs (Puritan Inc., P/N 25-806 2WC), the first prewetted with deionized water and the second dry. Cells were eluted from the swabs by vortexing them in 1.5 mL of sterile 1× phosphate-buffered saline (PBS) for ~10 minutes. The resulting solution was filtered through 100 μm filter paper into 1.5 mL centrifuge tubes, and the samples were centrifuged at 21,130 × g for 10 minutes. The supernatant was pipetted from each tube until the remaining volume was between 50 μL and 75 μL. The pellet was resuspended by vortexing and stored at room temperature until flow cytometry.
Morphological and autofluorescence measurements were extracted from cell populations by importing the primary data files generated by IFC (i.e., raw image files, '.rif') into IDEAS 6.2 software (Luminex Inc.; Austin, TX, USA). Next, intact cells were distinguished from cell fragments, debris, and other non-biological material by selecting detected 'events' with areas greater than 1000 μm² and aspect ratios over 0.4, using the Area_M04 and Aspect Ratio_M04 measurement variables, respectively. This subpopulation of cell images was further filtered for in-focus images by selecting cells with Gradient RMS values over 50 in the brightfield (i.e., non-fluorescence) channel.
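The two gating steps above amount to simple threshold filters applied to each detected event. The sketch below is a minimal illustration under stated assumptions, not the IDEAS gating workflow: the `Event` record and its field names are hypothetical stand-ins for exported Area_M04, Aspect Ratio_M04, and brightfield Gradient RMS values, and strict inequalities are assumed from the wording "greater than" and "over".

```python
from dataclasses import dataclass

# Hypothetical per-event record; field names are illustrative only and are
# not the column names exported by IDEAS.
@dataclass
class Event:
    area_um2: float         # Area_M04, in square micrometres
    aspect_ratio: float     # Aspect Ratio_M04 (assumed minor/major axis, 0 to 1)
    gradient_rms_bf: float  # Gradient RMS in the brightfield channel

# Thresholds as stated in the text.
MIN_AREA_UM2 = 1000.0
MIN_ASPECT_RATIO = 0.4
MIN_GRADIENT_RMS = 50.0

def is_intact_cell(e: Event) -> bool:
    """Size/shape gate: area above 1000 um^2 and aspect ratio above 0.4."""
    return e.area_um2 > MIN_AREA_UM2 and e.aspect_ratio > MIN_ASPECT_RATIO

def is_in_focus(e: Event) -> bool:
    """Focus gate: brightfield Gradient RMS above 50."""
    return e.gradient_rms_bf > MIN_GRADIENT_RMS

def gate(events: list[Event]) -> list[Event]:
    # Intact-cell gate first, then the focus gate, as described in the text.
    return [e for e in events if is_intact_cell(e) and is_in_focus(e)]

events = [
    Event(1500.0, 0.8, 62.0),  # passes both gates
    Event(800.0, 0.9, 70.0),   # too small: likely a fragment or debris
    Event(2100.0, 0.3, 55.0),  # too elongated
    Event(1800.0, 0.7, 40.0),  # out of focus
]
kept = gate(events)
print(len(kept))  # 1
```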
Once each subpopulation was defined, data were extracted from individual cells using 14 different categories of morphological and autofluorescence measurements, including area, aspect ratio, contrast, intensity, mean pixel, brightness detail intensity ('R3' pixel increment), brightness detail intensity ('R7' pixel increment), and compactness. Additionally, ratios of the Intensity and Brightness Detail Intensity measurements were calculated across several detector channels: Ch3:Ch1 (Brightness Detail Intensity R3), Ch3:Ch1 (Brightness Detail Intensity R7), Ch3:Ch1 (Intensity), Ch5:Ch1 (Intensity), Ch6:Ch1 (Intensity), and Ch6:Ch3 (Intensity). This yielded a total of 97 measurement variables for each detected cell. The primary data are available under the title "Imaging Flow Cytometry (IFC) dataset for cell populations recovered from 'touch'/trace biological samples" at DOI: 10.6084/m9.figshare.22652344, and an example IFC image is given in S1 Fig.
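The six channel-ratio features can be illustrated as below. This sketch assumes a hypothetical per-cell dictionary whose keys (e.g., `Intensity_Ch3`, `BDI_R3_Ch1`, with BDI standing for Brightness Detail Intensity) are illustrative rather than IDEAS export names; returning NaN for a zero denominator is our assumption, not a documented rule.

```python
import math

# (numerator channel, denominator channel, measurement) for each ratio in the text.
RATIO_SPECS = [
    ("Ch3", "Ch1", "BDI_R3"),     # Brightness Detail Intensity, R3
    ("Ch3", "Ch1", "BDI_R7"),     # Brightness Detail Intensity, R7
    ("Ch3", "Ch1", "Intensity"),
    ("Ch5", "Ch1", "Intensity"),
    ("Ch6", "Ch1", "Intensity"),
    ("Ch6", "Ch3", "Intensity"),
]

def ratio_features(cell: dict[str, float]) -> dict[str, float]:
    """Compute the channel-ratio features for one cell (hypothetical key scheme)."""
    out = {}
    for num, den, feat in RATIO_SPECS:
        denominator = cell[f"{feat}_{den}"]
        value = cell[f"{feat}_{num}"] / denominator if denominator != 0 else math.nan
        out[f"{feat}_{num}:{den}"] = value
    return out

# Made-up measurements for a single cell, for illustration only.
cell = {
    "Intensity_Ch1": 12000.0, "Intensity_Ch3": 3000.0,
    "Intensity_Ch5": 6000.0, "Intensity_Ch6": 1500.0,
    "BDI_R3_Ch1": 800.0, "BDI_R3_Ch3": 200.0,
    "BDI_R7_Ch1": 900.0, "BDI_R7_Ch3": 450.0,
}
ratios = ratio_features(cell)
```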
The observed cell dataset contained 97 IFC measurements from each of 10,221 detected cell events, or "observations" (i.e., biological particles meeting the size, shape, and contrast criteria described above), across all collected donor deposits. Samples were collected from 15 donors across multiple time points; in total, 47 distinct donor/timepoint combinations (hereafter "samples") are represented in the data. The number of timepoints sampled varied by donor, with a maximum of 9 and a median of 2. Table 1 summarizes the number of observations collected per sample at each timepoint. The number of observations per donor/timepoint ranged from 3 to 2,151, with a median of 78. Observations were taken between 1 and 415 days since deposition, with a median of

Machine learning models
We applied three separate supervised machine learning paradigms to predict the time since sample deposition in ln(days). First, we applied the elastic net under two conditions: with the alpha mixing parameter at 1 for the Least Absolute Shrinkage and Selection Operator (LASSO) fit, and with alpha at 0 for ridge regression [17]. Second, we applied Gradient Boosting Machines (GBM) [18], and third, we applied the generalized linear mixed model LASSO [19], which accounts for correlation among touch samples taken from the same donor. Data were analyzed in R (version 4.1.1) using the glmnet (version 4.1.4) [20], gbm (version 2.1.8) [21], and glmmLasso (version 1.6.2) [19] packages. All scripts used for data processing and analysis have been deposited in a public GitHub repository: https://github.com/AEGentry/ForensicScience_Public. Each of the ML approaches required cross-validation to select the optimal tuning parameters and/or hyperparameters. After cross-validation identified the optimal model for each approach, a final model was fit to the training data and assessed. For the LASSO and ridge regression, we fit the models to the training data using 10-fold cross-validation to choose the shrinkage parameter, via the built-in cross-validation functionality of the cv.glmnet command. The GBM fitting used 10-fold cross-validation to tune the model over a grid of potential hyperparameters: shrinkage (0.001, 0.01, and 0.05), interaction depth (1 and 2), and minimum observations per node (5, 10, and 15). We chose as optimal the GBM model that produced the minimum average mean squared error (MSE) across the folds. Once the optimal tuning parameters were chosen, the final model was fit to the training data using the recommended bag fraction of 0.5, with the optimal number of trees (i.e., iterations) chosen via 10-fold cross-validation. For the GLMM models, the glmmLasso package does not include built-in cross-validation, so we fit the model to the training data using 10-fold cross-validation to choose the shrinkage parameter from values ranging from 5 to 100 in increments of 5. A Gaussian response with an identity link was used for the GLMM model, together with a normally distributed random intercept to account for the correlation between samples from the same donor. This analysis framework is described in a flowchart in Fig 1.
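For reference, the two penalized regression frameworks can be written schematically as below. The elastic net objective follows the glmnet Gaussian parameterization, and the GLMM-LASSO model is written with a donor-level random intercept as described above. The notation is ours, and the package's actual estimation algorithm is not reproduced here; this is a sketch of the model structure only.

```latex
% Elastic net, glmnet Gaussian parameterization (notation ours).
% y_i = ln(days) for cell i; x_i = standardized IFC features (p = 97).
% alpha = 1 gives the LASSO fit; alpha = 0 gives ridge regression.
\min_{\beta_0,\,\beta}\;
  \frac{1}{2N}\sum_{i=1}^{N}\bigl(y_i - \beta_0 - x_i^{\top}\beta\bigr)^2
  + \lambda\Bigl[\tfrac{1-\alpha}{2}\,\lVert\beta\rVert_2^2 + \alpha\,\lVert\beta\rVert_1\Bigr]

% GLMM LASSO: cell j from donor d(j), with a donor-level random intercept.
y_j = \beta_0 + x_j^{\top}\beta + b_{d(j)} + \varepsilon_j,
  \qquad b_d \sim \mathcal{N}(0,\sigma_b^2),
  \qquad \varepsilon_j \sim \mathcal{N}(0,\sigma^2)

% Fixed effects are shrunk by an L1 penalty on the log-likelihood \ell.
\max_{\beta_0,\,\beta}\;
  \ell\bigl(\beta_0,\beta,\sigma_b^2,\sigma^2\bigr)
  - \lambda\sum_{k=1}^{p}\lvert\beta_k\rvert
```

The random intercept b_d is what lets the GLMM represent the shared, donor-specific shift in ln(days) among cells from the same donor, which the fixed-effects-only elastic net cannot.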

Assessing performance
To assess prediction performance and compare models, we calculated the mean squared error (MSE) and mean absolute error (MAE) between the observed and predicted days since deposition, on both the ln scale and the original scale, within the test set. Additionally, as a secondary statistical sensitivity analysis, we used N-fold (leave-one-sample-out) cross-validation: each ML model was fit to a subset of the training data 47 times, each time holding out one of the 47 sample cell populations, and the held-out donor/timepoint cell population was then used for prediction. These donor/timepoint sample populations are described in S1 Table. We also calculated MSE and MAE for the predictions within each of these donor/timepoint hold-out sets and summarized the means of these measures across the 47 hold-out sets.
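The N-fold sensitivity analysis is a leave-one-sample-out scheme: each fold holds out every cell from one donor/timepoint sample. A minimal sketch of the fold construction and the unweighted averaging of per-fold metrics is shown below; the sample labels are hypothetical, and model fitting itself is omitted.

```python
from collections import defaultdict
from statistics import fmean

def leave_one_sample_out(sample_ids):
    """Yield (held_out, train_idx, test_idx), one fold per distinct sample.

    sample_ids[i] is the donor/timepoint label of observation (cell) i.
    """
    by_sample = defaultdict(list)
    for i, s in enumerate(sample_ids):
        by_sample[s].append(i)
    for held_out in sorted(by_sample):
        test_idx = by_sample[held_out]
        train_idx = [i for i, s in enumerate(sample_ids) if s != held_out]
        yield held_out, train_idx, test_idx

def mean_over_folds(per_fold_metric):
    """Unweighted mean of a per-fold metric (e.g., MAE), one value per held-out sample."""
    return fmean(per_fold_metric.values())

# Hypothetical labels: 6 cells from 3 donor/timepoint samples.
sample_ids = ["D1_t7", "D1_t7", "D1_t90", "D2_t30", "D2_t30", "D2_t30"]
folds = list(leave_one_sample_out(sample_ids))
for held_out, train_idx, test_idx in folds:
    print(held_out, train_idx, test_idx)
```

Averaging per-fold metrics without weighting gives each sample equal influence regardless of how many cells it contributed.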
Each of the ML approaches predicts time since deposition as a continuous value (either ln(days) or days). To demonstrate the potential real-world usefulness of this prediction paradigm, we categorized these continuous predictions, both in the test set and within the N-fold cross-validation hold-out sets. We categorized the predictions using six binary schemes for time since deposition (less than or greater than 7, 30, 60, 90, 120, or 180 days) and summarized the performance of each as the proportion of predictions correctly categorized by each binary cutoff.
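Because the binary schemes only re-read the continuous predictions, each reduces to checking whether a prediction and its true value fall on the same side of a cutoff. The sketch below assumes predictions have already been back-transformed to days and treats a TSD at or below the cutoff as "under"; the example values are made up for illustration.

```python
CUTOFFS_DAYS = (7, 30, 60, 90, 120, 180)

def binary_accuracy(true_days, pred_days, cutoff):
    """Share of observations whose prediction lands on the same side of the cutoff as the truth.

    Assumption: a TSD at or below the cutoff counts as 'under'.
    """
    if len(true_days) != len(pred_days) or not true_days:
        raise ValueError("need two equal-length, non-empty sequences")
    agree = sum((t <= cutoff) == (p <= cutoff) for t, p in zip(true_days, pred_days))
    return agree / len(true_days)

def accuracy_by_cutoff(true_days, pred_days, cutoffs=CUTOFFS_DAYS):
    """Proportion correctly categorized for each binary cutoff."""
    return {c: binary_accuracy(true_days, pred_days, c) for c in cutoffs}

# Made-up example values in days, not study data.
true_days = [3, 10, 45, 100, 200, 400]
pred_days = [5, 6, 70, 95, 150, 300]
acc = accuracy_by_cutoff(true_days, pred_days)
```

Since ln is monotonic, comparing ln(prediction) with ln(cutoff) would give identical categories, so the back-transformation does not change the classification.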

Model fitting
The full dataset of 10,221 observations was divided into a training set of 6,414 observations (approximately 62.8% of the data) and a test set of 3,807 observations (approximately 37.2% of the data). The lambda penalty minimizing cross-validation error was 0.00081 in the LASSO model and 0.05997 in the ridge model; the LASSO model retained 83 predictors with non-zero coefficients. The best-fitting GBM used a shrinkage (also known as "learning rate" or "step-size reduction") of 0.05, a minimum of 15 observations per node, an interaction depth (also known as "tree depth") of 2, and an optimal number of trees of 5,786. For the GLMM model, the lambda penalty minimizing cross-validation error was 90, and the model retained 94 predictors with non-zero coefficients.
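The GBM tuning grid described in the methods has 3 × 2 × 3 = 18 combinations, and the GLMM shrinkage grid has 20 values. The sketch below enumerates both and selects the combination with the smallest mean cross-validated MSE. The `cv_mse` callable is a hypothetical placeholder for the actual 10-fold procedure, and the dummy score in the example is not derived from the study data.

```python
from itertools import product

# Grids as described in the text.
GBM_GRID = {
    "shrinkage": (0.001, 0.01, 0.05),
    "interaction_depth": (1, 2),
    "min_obs_per_node": (5, 10, 15),
}
GLMM_LAMBDAS = tuple(range(5, 101, 5))  # 5, 10, ..., 100

def gbm_candidates():
    """All hyperparameter combinations in the GBM grid, as dicts."""
    keys = list(GBM_GRID)
    return [dict(zip(keys, combo)) for combo in product(*GBM_GRID.values())]

def select_min_cv_mse(candidates, cv_mse):
    """Return the candidate with the smallest mean cross-validated MSE.

    cv_mse is a hypothetical callable: candidate -> mean MSE across folds.
    Ties are broken by grid order, since min() keeps the first minimum.
    """
    return min(candidates, key=cv_mse)

candidates = gbm_candidates()
print(len(candidates), len(GLMM_LAMBDAS))

# Dummy stand-in for real cross-validation, for illustration only.
def dummy_cv_mse(c):
    return c["shrinkage"] * c["interaction_depth"]

best = select_min_cv_mse(candidates, dummy_cv_mse)
# The number of trees is not part of this grid: per the text, it was chosen
# afterwards by cross-validation when fitting the final model.
```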

Prediction performance for estimating TSD in days
Overall MAE in ln(days) and in days on the original scale for all four models is detailed in Table 2 for the test set. MSE and training-set error are reported in S2 Table. Errors on the original scale were estimated by back-transforming the predicted ln(days) to days and calculating MSE and MAE on the transformed values. Optimal models were fit using MSE of ln(days) internal to the ML algorithms, as this loss function is used to optimize prediction, while MAE in raw days is useful for reporting performance because it is the more readily interpretable, intuitive metric. Within the training data, the GBM model outperformed all other models in both MSE and MAE, although the GLMM model performed similarly (e.g., an MAE of 17.2 days for GBM and 21.8 days for GLMM). In the hold-out test set, the GLMM model outperformed the others, with an MSE of 0.18 and an MAE of 0.31 in ln(days), while the GBM model performed nearly as well, with an MSE of 0.33 and an MAE of 0.41.
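The back-transformation and error summaries can be sketched as follows. This assumes a single vector of true TSDs in days and predictions on the ln(days) scale; the quartiles use linear interpolation (`statistics.quantiles` with `method="inclusive"`), which is our assumption. The example values are made up.

```python
import math
import statistics

def mse(obs, pred):
    """Mean squared error."""
    return sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs)

def mae(obs, pred):
    """Mean absolute error."""
    return sum(abs(o - p) for o, p in zip(obs, pred)) / len(obs)

def error_report(true_days, pred_ln_days):
    """Errors on the ln(days) scale and, after back-transforming, in days."""
    true_ln = [math.log(d) for d in true_days]
    pred_days = [math.exp(p) for p in pred_ln_days]  # back-transform to days
    abs_err_days = [abs(t - p) for t, p in zip(true_days, pred_days)]
    # Quartiles by linear interpolation (an assumption; needs >= 2 values).
    _, median_err, q3_err = statistics.quantiles(abs_err_days, n=4, method="inclusive")
    return {
        "MSE_ln": mse(true_ln, pred_ln_days),
        "MAE_ln": mae(true_ln, pred_ln_days),
        "MSE_days": mse(true_days, pred_days),
        "MAE_days": mae(true_days, pred_days),
        "median_abs_err_days": median_err,
        "q3_abs_err_days": q3_err,
    }

# Made-up example: true TSDs in days and predictions on the ln(days) scale.
true_days = [2.0, 10.0, 50.0, 200.0]
pred_ln_days = [math.log(3.0), math.log(8.0), math.log(50.0), math.log(150.0)]
report = error_report(true_days, pred_ln_days)
```

Exponentiating a prediction made on the ln scale yields a geometric-mean-type estimate rather than an arithmetic mean in days, which is one reason the two scales can rank errors differently.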
In the sensitivity analysis that used N-fold cross-validation over donor cell populations (Table 3 and S3 Table), the GBM model again outperformed all other models, with an average MSE of 1.28 and MAE of 0.82, while the GLMM model achieved an average MSE of 1.41 and MAE of 0.89. Figs 2 and 3 illustrate the absolute prediction error in days for the GBM and GLMM models, respectively, in the hold-out test set. The median absolute error was 17.3 (GBM) and 12.8 (GLMM) days, with third quartiles of only 37.8 (GBM) and 25.7 (GLMM) days. Notably, although the absolute error increases as the true TSD increases, the error was very low for the majority of samples, particularly those less than two months old. This trend is not unexpected given the structural and biochemical heterogeneity of shed epidermal cells within touch/trace samples [22]. Indeed, given the established variability within an epidermal cell population deposited by one person at a single timepoint, it is reasonable to expect the morphological and/or autofluorescence changes that occur as the sample ages to diverge increasingly over time, and for subsets of the cells to change at different rates.

Prediction performance for estimating TSD with binary time intervals
TSD estimates from each regression model were also used to classify each donor cell population into one of two time intervals spanning the entire observation period of this study (i.e., between 0 and 415 days). To further explore the potential casework utility of TSD estimates, we chose a series of binary cutoff points to use as classifiers. In this approach, we simply transformed the model-predicted TSD to days and calculated the proportion of estimates that correctly classified the TSD as older or newer than a given cutoff. We did not conduct any new classification procedures; this reporting simply represents a categorization of the continuous prediction.
The proportions of observations correctly classified according to the series of binary cutoffs are shown in aggregate for the test set and the donor/timepoint set in Table 4, with the Total column showing the number of cells with a true TSD at or below the binary cutoff in the test and donor/timepoint hold-out sets. In the test set, the GBM and GLMM models performed similarly, correctly classifying observations 81-99% of the time. Performance was similar in the donor/timepoint set, with the GBM and GLMM models correctly classifying observations 70-97% of the time. In both sets, the best prediction occurred at the tails, that is, for the lowest binary cutoff (greater/less than one week) and the highest (greater/less than 180 days). For the GBM and GLMM models, the best-performing approaches, we also report the breakdown of classifier performance by donor (S4 and S5 Tables, respectively) and by donor/timepoint (S6 and S7 Tables, respectively). Overall, these summaries do not indicate any extreme donor or donor/timepoint outliers, although samples with more observations tended to perform better.

Discussion
The results of this study demonstrate the potential for machine learning-based methods to provide probative TSD estimates for 'touch' biological samples from IFC measurements of cell populations. We tested four separate models across three machine learning paradigms and found that the GBM and GLMM models outperformed both the LASSO and the ridge regression implementations of the elastic net. This disparity in performance is not wholly unexpected: ensemble learners such as GBMs often outperform penalized regressions (i.e., the elastic net) on some data, while we postulate that the improved performance of the GLMM models was due to the random effects in that model, which account for the covariance among observations from the same donor. Where linear model assumptions hold, it is reasonable to expect a mixed model to outperform fixed-effects-only models when correlated observations are present, as it appropriately models the potential non-zero covariance between observations taken from the same subject or donor. In practice, however, it may not always be possible or practical to identify from whom each touch sample originated, and therefore to model the covariance properly. In these cases, a GBM may be a good proxy, even though it incorporates no random-effect terms.
The best-performing model (GLMM) showed an overall MAE of ~21 days for predicted time since deposition of blinded (i.e., hold-out) cell population samples. While this level of uncertainty may seem large for forensic applications, we noted that the error varied with the TSD of the sample. In particular, cell populations deposited less than one week previously had an MAE of 1.1 days (GLMM model), whereas cell populations deposited more than six months previously had an MAE of 161.9 days. Within a forensic context, because it is relatively common for the prosecution and defense to theorize highly disparate time frames for DNA deposition, these TSD estimates can be quite probative. One side may propose that DNA recovered from a crime scene was deposited at the time of the crime, while the other may claim it was transferred days, weeks, months, or years before or after the crime event [23][24][25]. As such, even broad TSD estimates may help both forensic laboratories and the legal system distinguish relevant from misleading biological material in the course of a criminal investigation.
To investigate how TSD estimates could be applied within this context, we considered an alternative framework that interprets TSD estimates using a series of binary time intervals, determining whether a sample was deposited more or less than 7, 30, 60, 90, 120, or 180 days before collection. With this scheme, both the GBM and GLMM models classified the age of a sample most accurately at the extremes of the TSD range; in the test set, the GLMM model achieved over 99% accuracy in categorizing samples as greater/less than one week old and 96% accuracy in categorizing samples as greater/less than 180 days old. We further demonstrated that this performance remained strong in the sensitivity analysis testing the model's ability to predict TSD for a blind sample, i.e., one that was held out and to which the trained model was naive. In this set of held-out samples, the GLMM model's overall classification accuracy was 97% for categorizing a sample as 7 days old or less and 95% for categorizing a sample as 180 days old or less. For cutoffs toward the center of the TSD distribution, the model's ability to delineate between classes diminishes slightly, with observed accuracies of 91%, 92%, 85%, and 90% for categorizing cells as greater/less than 30, 60, 90, and 120 days old, respectively.
While our overall goal with this study was to evaluate different quantitative frameworks for analyzing IFC measurements and to explore the potential to estimate TSD for forensically relevant biological samples, the primary limitation of these results is the moderate sample size. We had 15 independent donors who provided a total of 81 contributor cell populations that were subsequently measured at various timepoints. It is possible that time-since-deposition signatures are affected by contributor-specific variation in autofluorescence at specific wavelengths, as has been previously described [26,27], which could contribute to the misclassification of individual cells or of a donor cell population. Future studies that survey a larger pool of individuals, as well as an explicit range of biological attributes (e.g., chronological age, ancestry), can help characterize the impact of contributor-specific variation and potentially increase the accuracy and robustness of time-since-deposition signatures.
Beyond the highly variable timepoints, a related limitation is the uneven number of cells detected in each sample, which is typical of touch DNA deposits [28]. An expanded reference dataset should also help address these limitations and improve the models and resulting TSD estimates. We expect that the results of this proof-of-concept study could serve as the basis for a formal validation effort that tests the robustness of TSD estimation on a larger donor set and, possibly, under conditions encountered by caseworking agencies, including mixture samples and samples with varying degrees of degradation.
Lastly, we tested only four, largely regression-centric machine learning models here. This limitation, however, has important implications for practice. These four models were selected explicitly because they are largely not "black box" in nature, allowing the user to understand how each independent variable contributes to the final prediction. Such models can be more intuitive and interpretable than other computational approaches, such as deep learning, which may facilitate their adoption by the forensic science community. Furthermore, our approaches employed continuous regression models; evaluation of the error structure from these models may indicate that a discrete response model is warranted. Future work will evaluate the utility of such models for this and similar data.
As a final note, the field should consider moving away from the "time since deposition" nomenclature found in the existing literature, which may be out of step with our current understanding of the realities of direct and indirect transfer of biological material. Because these methods are based on cellular changes over time, they actually predict the time since primary transfer from the biological source. Sometimes this is the same as the time since deposition (e.g., if a person touches an object and their cells transfer to it), but if indirect transfer takes place, it may not be. In other words, if cells are transferred from their biological source to one substrate, and then from that substrate to an item at the crime scene at a different point in time (a phenomenon that has been shown to occur both in experimental studies and in real-world cases [24,29]), the predicted time should theoretically correspond to the original, or "primary," transfer event, and this would be desired.

Table 1. Number of observations taken from each donor at each timepoint.
We transformed raw time in days to ln(days) for the machine learning predictions, and the input IFC features were centered and scaled, as is standard practice, prior to the regression-based ML approaches. Before applying any machine learning methods to predict the time since deposition of touch samples, we divided the data into non-overlapping training and test sets. The test set is a hold-out set kept in reserve for testing models constructed using the training set and contains no points used in training the models. The number of timepoints collected per donor, as well as the number of observations taken at each timepoint, varied across donors. To balance the test and training sets, we reserved 20% of the observations from each sample for the test set, rounding the number of test observations up when 20% of a sample's observations was not a whole number. Furthermore, when the number of training observations for a given sample exceeded the 90th percentile of observation counts across all donors/timepoints, we capped the number of training observations at the 90th percentile and reserved the remainder for the test set. S1 Table summarizes the number of observations in the test and training sets for each sample.
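The per-sample split rule can be sketched as below. Several details are assumptions rather than statements from the text: the 90th percentile is computed over the total observation counts per sample with linear interpolation and rounded down to a whole number, and observations are assigned to the training set by a seeded random shuffle. The sample names and counts in the example are hypothetical.

```python
import math
import random
import statistics

def split_plan(n_by_sample, test_pct=20):
    """Return {sample: (n_train, n_test)} following the rule described above."""
    counts = list(n_by_sample.values())
    # Assumed: 90th percentile of per-sample totals, linear interpolation, floored.
    cap = math.floor(statistics.quantiles(counts, n=10, method="inclusive")[-1])
    plan = {}
    for sample, n in n_by_sample.items():
        n_test = -(-n * test_pct // 100)  # integer ceiling of 20% of n
        n_train = n - n_test
        if n_train > cap:
            # Cap the training share; the excess goes to the test set.
            n_test += n_train - cap
            n_train = cap
        plan[sample] = (n_train, n_test)
    return plan

def assign(obs_ids_by_sample, plan, seed=0):
    """Randomly assign observation ids to training/test sets per the plan (assumed rule)."""
    rng = random.Random(seed)
    train, test = [], []
    for sample, ids in obs_ids_by_sample.items():
        n_train, _ = plan[sample]
        shuffled = list(ids)
        rng.shuffle(shuffled)
        train.extend(shuffled[:n_train])
        test.extend(shuffled[n_train:])
    return train, test

# Hypothetical per-sample counts for ten samples.
n_by_sample = {f"S{i}": n for i, n in enumerate(
    [3, 20, 40, 60, 78, 100, 150, 300, 900, 2151], start=1)}
plan = split_plan(n_by_sample)
```

The cap is why the overall test fraction can exceed 20%: very large samples send everything above the cap to the test set, which keeps any single sample from dominating training.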